Abstract

Graphics cards for personal computers have recently undergone 
a radical transformation from fixed-function graphics pipelines to 
multi-processor, programmable architectures. Multi-processor ar- 
chitectures are clearly advantageous for graphics for the simple 
reason that graphics computations are naturally concurrent, map- 
ping well to stateless stream processing. They therefore parallelize 
easily and need no random access to memory with its problematic 
latencies. 

This paper presents Vertigo, a purely functional, Haskell-embedded 
language for 3D graphics and an optimizing compiler that gener- 
ates graphics processor code. The language integrates procedural 
surface modeling, shading, and texture generation, and the com- 
piler exploits the unusual processor architecture. The shading sub- 
language is based on a simple and precise semantic model, in con- 
trast to previous shading languages. Geometry and textures are also 
defined via a very simple denotational semantics. The formal se- 
mantics yields not only programs that are easy to understand and 
reason about, but also very efficient implementation, thanks to a 
compiler based on partial evaluation and symbolic optimization, 
much in the style of Pan [2]. 

Haskell's overloading facility is extremely useful throughout Ver- 
tigo. For instance, math operators are used not just for floating 
point numbers, but also expressions (for differentiation and com- 
pilation), tuples, and functions. Typically, these overloadings cas- 
cade, as in the case of surfaces, which may be combined via math 
operators, though they are really functions over tuples of expres- 
sions on floating point numbers. Shaders may be composed with 
the same notational convenience. Functional dependencies are ex- 
ploited for vector spaces, cross products, and derivatives. 



*The work reported in this paper was done while the author was at 
Microsoft Research. 
1 Introduction

There has recently been a revolution in processor architecture 
for personal computers. High-performance, multi-processor, data- 
streaming computers are now found on consumer-level graphics 
cards. The performance of these cards is growing at a much faster 
rate than CPUs, at roughly Moore's law cubed [4]. Soon the com- 
putational power of these graphics processing units ("GPUs") will 
surpass that of the system CPU. 

Some common applications of GPUs include geometric transformation,
traditional and alternative lighting and shading models
("programmable shaders"), and procedural geometry, textures, and
animation.
The accepted programming interfaces are assembler and C-like
"shading languages", having roots in RenderMan's shading
language [5, 14, 3, 10]. This is an unfortunate choice, because the
computations performed are naturally functional. In fact, these
C-like languages are only superficially imperative. This paper offers a
functional alternative to existing shading languages that simplifies
and generalizes them without sacrificing performance.

GPU architectures are naturally functional as well. The low-level
execution model is programs acting in parallel over input streams
producing new output streams with no dependence between stream
members, i.e., pure functions mapped over lists. Pipelining is used
between the different processor types (vertex and pixel processors
in the current architectures), much like compositions of lazy stream
functions.

The main contributions reported in this paper are as follows: 

• Optimized compilation of a functional language to modern
graphics hardware.

• A simple and practical embedding of parametric surface
definition and composition (generative modeling [12]) in a
functional programming language. (See also [6].)

• A simple but powerful semantic model for shading languages, 
with direct implementation of that model. 
Figure 1. Vertex shader model



2 Why Functional Graphics? 

Functional programming is a natural fit for computer graphics
simply because most of the objects of interest are functions.

• Parametric surfaces are functions of type ℝ² → ℝ³ to be
evaluated over a subregion of ℝ².

• Implicit surfaces and spatial regions are functions of type
ℝ³ → ℝ, where surface, inside, and outside are distinguished
by the sign of the resulting real value. Planar regions are
functions of type ℝ² → ℝ.

• Height fields, as used to represent a class of geometry as well
as bump mapping and displacement mapping, are functions of
type ℝ² → ℝ.

• Spatial transformations (e.g., affines and deformations) are
functions of type ℝ³ → ℝ³ for 3D or ℝ² → ℝ² for 2D.

• Resolution-independent images are functions of type
ℝ² → Color.

• 2D & 3D animations and time-varying values of all types are
functions from ℝ.

• Lights of all kinds are functions from points in ℝ³ to the
direction and color of the light delivered to that point.

• Shaders are functions from view information (ambient color, 
eye point and set of active lights) and surface point informa- 
tion (color, location and surface derivatives). 

Computer graphics math makes extensive use of linear algebra, and in
particular matrices for representing linear, affine, or projective
spatial transformations. There are actually competing conventions for
transforming vectors with matrices using matrix multiplication. In 
one, the matrix is on the left and the vector is a column, while in 
the other, the vector is a row and the matrix is on the right. Trans- 
formations are composed by multiplying the matrices, taking care 
with the order, consistently with the pre-multiply or post-multiply 
convention. With a functional foundation, one can simply let the 
transformations be functions that happen to be linear, affine or pro- 
jective, or might be arbitrary spatial deformations, such as bends, 
twists, or tapers. 

3 Graphics processors 

Vertigo targets the DirectX 8.1 vertex shader model shown in
Figure 1, which is taken from [9]. This model and a multiprocessor
implementation are described in [8]. This unit is replicated,
typically with four or eight instances. Every register is a quadruple
of 32-bit floating point numbers (a "quad-float"). Every "vertex" 
is represented by up to 16 registers, having user-specified seman- 
tics, e.g., coordinates of a 3D point, its normal vector, one or more 
sets of texture coordinates, etc. Vertex and constant registers are 
read-only, and the output registers are write-only. Temporary reg- 
isters may be written and read during a vertex computation but are 
cleared before each new vertex. That property is important, because 
it means that (a) several vertex processors may run in parallel, and 
(b) vertex processing is simply mapping of a pure function over a 
vertex stream. 

The input vertex stream is parceled out to the vertex processors, 
and the resulting output is reassembled and fed to the pool of pixel 
processors, which are not discussed in this article. 

An important aspect of this model is that random memory access
is extremely limited (to these registers). Large amounts of vertex
data are accessed by streaming from video RAM rather than by
random access.

One reason GPUs and functional programming fit together is that
GPUs inherently compute staged functions. Vertex computations
depend on "constant" registers and on vertex registers. Values held 
in the constant registers may be set at most once per stream of ver- 
tices, being held constant among vertices in a stream. Typically 
these constant registers contain both actual constants and time- 
varying values. Thus any vertex computation may be cast as a cur- 
ried function: 

vc :: MeshData → (VertexData → Vout)

Given such a computation vc, mesh data md, and a stream svd of 
vertex data, the vertex processor hardware simply computes 

map (vc md) svd 
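
The staging described above can be tried out in plain Haskell. The following is only a minimal sketch: the record fields of MeshData, the triple representations of VertexData and Vout, and the helper runStream are hypothetical stand-ins chosen for illustration, not Vertigo's actual types.

```haskell
-- Hypothetical stand-ins for the paper's MeshData, VertexData, and Vout.
data MeshData = MeshData { time :: Float, scale :: Float }

type VertexData = (Float, Float, Float)   -- an object-space position
type Vout       = (Float, Float, Float)   -- a transformed position

-- A staged vertex computation: mesh data first, then per-vertex data.
vc :: MeshData -> (VertexData -> Vout)
vc md = \(x, y, z) -> (s * x, s * y + bob, s * z)
  where
    -- Computed once per stream, like values held in constant registers.
    s   = scale md
    bob = sin (time md)

-- What the vertex hardware computes for a whole stream of vertices.
runStream :: MeshData -> [VertexData] -> [Vout]
runStream md svd = map (vc md) svd
```

Because s and bob are bound outside the lambda, they depend only on the mesh data, which mirrors the role of the constant registers.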

4 Geometry 

3D graphics cards mainly render vertex meshes, with each vertex
containing information such as 3D location, normal vector, and texture
coordinates. The new breed of graphics processors, being programmable,
are very flexible in the type of streams they can operate on and what
computations they can perform. Vertigo concentrates on synthetic (or
"procedural") geometry, from which vertex meshes are extracted
automatically and efficiently. The main type of interest is a
(parametric) surface, which is simply a mapping from ℝ² to ℝ³.

type Surf = ℝ² → ℝ³

type ℝ² = (ℝ, ℝ)
type ℝ³ = (ℝ, ℝ, ℝ)

By convention, during display, surfaces will be sampled over the 
2D interval [-1/2,1/2] x [-1/2,1/2]. 

At this point, the reader may safely interpret ℝ as synonymous with
Float. The actual meaning of ℝ is expressions over Float, so that
the implementation can perform optimizing compilation (Section 6) 
and symbolic differentiation (Section 8). 

Now one can start defining surfaces directly. For instance, here are 
a unit sphere and a cylinder with a given height and unit radius. 

sphere :: Surf
sphere (u, v) = (cos θ • sin φ, sin θ • sin φ, cos φ)
  where θ = 2 • π • u
        φ = π • v

cylinder :: ℝ → Surf
cylinder h (u, v) = (cos θ, sin θ, h • v)
  where θ = 2 • π • u

Note that as u and v vary between −1/2 and 1/2, θ varies between
−π and π, while φ varies between −π/2 and π/2 (south and north
poles).

More powerfully, using higher-order functions, we can construct 
surfaces compositionally, as in the method of generative model- 
ing [12, 11]. The next several examples introduce and demonstrate 
a collection of useful combinators for surface composition. 

4.1 Height fields

"Height fields" are simply functions from ℝ² to ℝ and may be
visualized in 3D in the usual way:

type HeightField = ℝ² → ℝ

hfSurf :: HeightField → Surf
hfSurf field (u, v) = (u, v, field (u, v))

A simple definition produces ripples: 

ripple :: HeightField
ripple = sinU ∘ magnitude

Here sinU is a convenient variant of the sin function, normalized
to have unit period. (The typeset code examples in this paper use
an infix "•" operator for regular multiplication and for scalar/vector
multiplication introduced below.)

cosU, sinU :: ℝ → ℝ
cosU θ = cos (2 • π • θ)
sinU θ = sin (2 • π • θ)

Figure 2. rippleS 5.7 0.1

Now let's add the ability to alter the frequency and magnitude of
the ripples. This ability is useful in many examples, so abstract it
out:

freqMag :: Surf → (ℝ, ℝ) → Surf
freqMag f (freq, mag) = (mag •) ∘ f ∘ (freq •)

Combining, we get the surface shown in Figure 2.¹

rippleS :: ℝ² → Surf
rippleS = hfSurf ∘ freqMag ripple
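
The ripple construction can be exercised directly with ordinary floats. This is a sketch under stated assumptions: ℝ is taken to be Float (as the text permits), freqMag is specialized to height fields, and the names magnitude2, scale2, and sampleGrid are hypothetical helpers that do not come from Vertigo.

```haskell
type R  = Float
type R2 = (R, R)
type R3 = (R, R, R)

type Surf        = R2 -> R3
type HeightField = R2 -> R

-- Sine normalized to unit period.
sinU :: R -> R
sinU t = sin (2 * pi * t)

-- Euclidean length of a 2D vector (hypothetical monomorphic helper).
magnitude2 :: R2 -> R
magnitude2 (x, y) = sqrt (x * x + y * y)

hfSurf :: HeightField -> Surf
hfSurf field (u, v) = (u, v, field (u, v))

ripple :: HeightField
ripple = sinU . magnitude2

-- Monomorphic freqMag: scale the input by freq and the output by mag.
freqMag :: HeightField -> (R, R) -> HeightField
freqMag f (freq, mag) = (mag *) . f . scale2 freq
  where scale2 k (x, y) = (k * x, k * y)

rippleS :: (R, R) -> Surf
rippleS = hfSurf . freqMag ripple

-- Sample a surface on an n-by-n grid (n >= 2) over [-1/2, 1/2] x [-1/2, 1/2].
sampleGrid :: Int -> Surf -> [R3]
sampleGrid n surf =
  [surf (coord i, coord j) | i <- [0 .. n - 1], j <- [0 .. n - 1]]
  where coord k = fromIntegral k / fromIntegral (n - 1) - 0.5
```

For example, sampleGrid 4 (rippleS (5.7, 0.1)) yields sixteen points whose heights stay within the chosen magnitude.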

The definition of freqMag uses operators to scale the incoming ℝ²
and outgoing ℝ³ points. These operators belong to the vector space
type class defined as follows, for a scalar type s and a vector space
v over s. (The actual operator for scalar multiplication is "*^".)

class Floating s ⇒ VectorOf s v | v → s where
  (•)   :: s → v → v
  (<•>) :: v → v → s   -- dot product

The general type of freqMag then is as follows. 

freqMag :: (VectorOf si vi, VectorOf so vo)
        ⇒ (vi → vo) → (si, so) → (vi → vo)

The constraints here say that the types vi and vo are vector spaces
over the scalar fields si and so, respectively.

As another surface example, here is a wavy "eggcrate" height field: 

eggcrate :: HeightField
eggcrate (u, v) = cosU u • sinU v

The definition of eggcrate (u, v) above fits a pattern: the result 
comes from sampling one function at u and another at v and com- 
bining the results. Since this pattern arises in other examples, we 
abstract it out. 

eggcrate = cartF (•) cosU sinU

cartF :: (a → b → c) → (u → a) → (v → b)
      → (u, v) → c
cartF op f g (u, v) = f u `op` g v



¹The GUIs shown in this paper are automatically generated
based on the type of a parameterized surface and a small
specification of the labels and ranges for parameter sliders.




Figure 3. eggcrateS 2.6 0.23



Now add control for frequency and magnitude of the waves, to get 
the surface shown in Figure 3. 

eggcrateS :: ℝ² → Surf
eggcrateS = hfSurf ∘ freqMag eggcrate



4.2 Sweeps 

Another surface composition technique is using one curve to 
"sweep" another. 

type Curve2 = ℝ → ℝ²
type Curve3 = ℝ → ℝ³

sweep :: Curve3 → Curve3 → Surf
sweep basis scurve (u, v) = basis u + scurve v

Or more succinctly, 

sweep = cartF (+)

For instance, a cylinder is a circle swept by a line.

cylinder h = sweep (addZ circle) (addXY (h •))

The helper functions addXY and addZ simply increase the
dimensionality of a value in ℝ or ℝ² respectively, inserting zeros. For
convenience, they actually apply to functions that produce ℝ or ℝ².

addX, addY, addZ :: (a → ℝ²) → (a → ℝ³)
addX = lift1 (λ(y, z) → (0, y, z))
addY = lift1 (λ(x, z) → (x, 0, z))
addZ = lift1 (λ(x, y) → (x, y, 0))

addYZ, addXZ, addXY :: (a → ℝ) → (a → ℝ³)
addYZ = lift1 (λx → (x, 0, 0))
addXZ = lift1 (λy → (0, y, 0))
addXY = lift1 (λz → (0, 0, z))



The handy "lifting" functionals are defined as follows: 

lift1 h f1 x = h (f1 x)

lift2 h f1 f2 x = h (f1 x) (f2 x)

lift3 h f1 f2 f3 x = h (f1 x) (f2 x) (f3 x)

We can define the circle curve out of lower-dimensional functional
pieces as well:²

circle :: Curve2
circle = cosU `pairF` sinU

pairF :: (c → a) → (c → b) → (c → (a, b))
pairF = lift2 (,)
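
The lifting combinators and the sweep construction can be checked with a small self-contained module. This is a sketch assuming ℝ = Float. Since plain tuples are not numbers in standard Haskell, the hypothetical helper addV stands in for the overloaded (+) used in the definition of sweep above.

```haskell
type R  = Float
type R2 = (R, R)
type R3 = (R, R, R)

type Curve2 = R -> R2
type Curve3 = R -> R3
type Surf   = R2 -> R3

lift1 :: (a -> b) -> (c -> a) -> (c -> b)
lift1 h f1 x = h (f1 x)

lift2 :: (a -> b -> d) -> (c -> a) -> (c -> b) -> (c -> d)
lift2 h f1 f2 x = h (f1 x) (f2 x)

-- Sample one function at u, another at v, and combine the results.
cartF :: (a -> b -> c) -> (u -> a) -> (v -> b) -> ((u, v) -> c)
cartF op f g (u, v) = f u `op` g v

pairF :: (c -> a) -> (c -> b) -> (c -> (a, b))
pairF = lift2 (,)

cosU, sinU :: R -> R
cosU t = cos (2 * pi * t)
sinU t = sin (2 * pi * t)

circle :: Curve2
circle = cosU `pairF` sinU

addZ :: (a -> R2) -> (a -> R3)
addZ = lift1 (\(x, y) -> (x, y, 0))

addXY :: (a -> R) -> (a -> R3)
addXY = lift1 (\z -> (0, 0, z))

-- Explicit vector addition, standing in for the paper's overloaded (+).
addV :: R3 -> R3 -> R3
addV (a, b, c) (x, y, z) = (a + x, b + y, c + z)

sweep :: Curve3 -> Curve3 -> Surf
sweep = cartF addV

cylinder :: R -> Surf
cylinder h = sweep (addZ circle) (addXY (h *))
```

Evaluating cylinder 2 (0, 0.5) gives a point on the unit circle in X and Y, raised to height 1.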

4.3 Surfaces of revolution 

Another commonly useful building block is revolution of a curve. 
To define revolution, simply lift the curve into ℝ³ by adding a zero
Z coordinate, and then rotate around the Y axis.

revolve :: Curve2 → Surf
revolve curve (u, v) = rotY (2 • π • u) (addZ curve v)

The function rotY is an example of a 3D spatial "transform". Tra- 
ditionally in computer graphics, transforms are restricted to linear, 
affine, or projective mappings and are represented by matrices. In 
a functional setting, they may more simply and more generally be 
functions: 

type Transform1 = ℝ → ℝ
type Transform2 = ℝ² → ℝ²
type Transform3 = ℝ³ → ℝ³

To rotate a 3D point about the Y axis, it suffices to rotate (x, z) in
2D and hold y constant:

rotY :: ℝ → Transform3
rotY θ = onXZ (rotate θ)

rotate :: ℝ → Transform2
rotate θ (x, y) = (x • c − y • s, y • c + x • s)
  where c = cos θ
        s = sin θ

onXY, onYZ, onXZ :: Transform2 → Transform3
onXY f (x, y, z) = (x', y', z)
  where (x', y') = f (x, y)
onXZ f (x, y, z) = (x', y, z')
  where (x', z') = f (x, z)
onYZ f (x, y, z) = (x, y', z')
  where (y', z') = f (y, z)

Spheres and cylinders are surfaces of revolution: 

sphere = revolve semiCircle 

cylinder h = onZ (h •) ∘ revolve (λy → (1, y))

A semi-circle is just a circle sampled over half of its usual domain 
([-1/4, 1/4] instead of [-1/2, 1/2]): 

semiCircle = circle ∘ (/ 2)



²Building higher-dimensional shapes out of lower ones is one of
the themes of generative modeling [12, 11].




Figure 4. torusFrac 1.5 0.5 0.8 0.8 




Figure 5. eggcrateCylinder 3.8 4.0 0.23 



The torus is a more interesting example. It is the revolution of a 
scaled and offset circle. 

torus :: ℝ → ℝ → Surf
torus sr cr = revolve (const (sr, 0) + const cr • circle)

Note that the addition and multiplication here are working directly 
on 2D curves, thanks to arithmetic overloading on functions and on 
tuples. 

instance Num b ⇒ Num (a → b) where
  (+) = lift2 (+)
  (•) = lift2 (•)
  negate = lift1 negate
  fromInteger = const ∘ fromInteger
  -- etc.
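
The cascading overloading can be reproduced in standard Haskell. The sketch below assumes ℝ = Float and gives Num instances for functions and for pairs. It differs from the paper in one labeled respect: the scalar/vector product is replaced by componentwise multiplication with the pair (cr, cr), and torusSection is a hypothetical name for the curve that torus revolves.

```haskell
type R      = Float
type R2     = (R, R)
type Curve2 = R -> R2

-- Arithmetic on functions: apply pointwise.
instance Num b => Num (a -> b) where
  f + g       = \x -> f x + g x
  f * g       = \x -> f x * g x
  negate f    = negate . f
  abs f       = abs . f
  signum f    = signum . f
  fromInteger = const . fromInteger

-- Arithmetic on pairs: apply componentwise.
instance (Num a, Num b) => Num (a, b) where
  (a, b) + (c, d) = (a + c, b + d)
  (a, b) * (c, d) = (a * c, b * d)
  negate (a, b)   = (negate a, negate b)
  abs (a, b)      = (abs a, abs b)
  signum (a, b)   = (signum a, signum b)
  fromInteger n   = (fromInteger n, fromInteger n)

circle :: Curve2
circle t = (cos (2 * pi * t), sin (2 * pi * t))

-- The scaled and offset circle that a torus revolves. The paper scales with
-- a scalar/vector product; componentwise (*) with (cr, cr) is used here.
torusSection :: R -> R -> Curve2
torusSection sr cr = const (sr, 0) + const (cr, cr) * circle
```

Here (+) and (*) operate on curves, which are functions returning pairs, so both instances are used at once.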

To make the example more interesting, add parameters to scale 
down the surface parameters u and v. The result is an incomplete 
torus, as in Figure 4. 

torusFrac sr cr cfrac sfrac =
  torus sr cr ∘ (• (cfrac, sfrac))

4.4 Displacement surfaces 

As a final example of surface construction, Figure 5 results from
"displacing" a cylinder using the eggcrate height field.

eggcrateCylinder h fm =
  displace (cylinder h) (freqMag eggcrate fm)

The definition of displacement is direct: 

displace :: Surf → HeightField → Surf
displace surf field = surf + field • normal surf

Note that the surface, its normal, and the height field are all sampled
at the same point in ℝ². The displacement vector gets its direction
from the surface normal and its distance from the height field.

Normals are computed by taking the cross products of the partial 
derivatives. 

normal :: Surf → Surf
normal = normalize ∘ cross ∘ derivative

As described in Section 8, Vertigo computes derivatives exactly, not 
through numeric approximation. 

Vector normalization scales to unit length, and is defined indepen- 
dently of any particular vector space. 

normalize :: VectorOf s v ⇒ v → v
normalize v = v / magnitude v

magnitude :: VectorOf s v ⇒ v → s
magnitude v = sqrt (v <•> v)
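
The vector space class and these two functions can be written in GHC Haskell with functional dependencies. This is a minimal sketch rather than Vertigo's own code: the ASCII operators *^ and <.> stand in for the typeset "•" and "<•>", the instances for Float pairs and triples are assumptions, and normalize scales by the reciprocal of the magnitude instead of dividing the vector.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

infixl 7 *^
infixl 7 <.>

-- A vector space v over a scalar field s; the vector type determines s.
class Floating s => VectorOf s v | v -> s where
  (*^)  :: s -> v -> v   -- scalar multiplication
  (<.>) :: v -> v -> s   -- dot product

instance VectorOf Float (Float, Float) where
  k *^ (x, y)       = (k * x, k * y)
  (a, b) <.> (x, y) = a * x + b * y

instance VectorOf Float (Float, Float, Float) where
  k *^ (x, y, z)          = (k * x, k * y, k * z)
  (a, b, c) <.> (x, y, z) = a * x + b * y + c * z

magnitude :: VectorOf s v => v -> s
magnitude v = sqrt (v <.> v)

-- Scale to unit length, independently of the particular vector space.
normalize :: VectorOf s v => v -> v
normalize v = (1 / magnitude v) *^ v
```

The dependency v → s is what lets normalize mention only v in its result type while still using a scalar internally.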

The type of normal is actually more general: 

normal :: ( Derivative c vec vecs
          , Cross vecs vec
          , VectorOf s vec )
       ⇒ (c → vec) → (c → vec)

The constraints mean that (a) the derivative of a c → vec function
has type c → vecs, (b) the cross product of a vecs value has type
vec, and (c) the type vec is a vector space over the scalar field s. In
the Surf case, s = ℝ, c = ℝ², vec = ℝ³, and vecs = (ℝ³, ℝ³).

The inferred type of displace is also more general than given above. 

displace :: ( Num (c → vec)
            , Cross vecs vec
            , Derivative c vec vecs
            , VectorOf s vec
            , VectorOf (c → s) (c → vec) )
         ⇒ (c → vec) → (c → s) → (c → vec)

For instance, the cross product of a single 2D vector (x, y) is the 2D
vector (y, −x), and the displace function may be used to displace
one 2D curve with a "2D height field" (of type ℝ → ℝ). In this
case, s = ℝ, c = ℝ, vec = ℝ², and vecs = ℝ².

5 Shading 

Shading languages began with Cook's "shade trees", which were 
expression trees used to represent shading calculations. The most 
successful shading language has been RenderMan's [5, 14]. 

One interesting aspect of RenderMan's shading language is that
the data it uses comes in at different frequencies (surface patches,
points on surfaces, and light sources). As an example, here is a
definition of a diffusely reflecting surface [14, page 335] (simplified).

surface
matte (float Ka, Kd)
{
    Ci = Cs * (Ka*ambient() + Kd*diffuse(N));
}

In explanations of this shading language, invocations of a param- 
eterized shader like matte are referred to as "instances", and the 
parameters like Ka and Kd are referred to as "instance variables". A
given instance is "called" perhaps thousands or millions of
times for different sample points on a surface. These "calls" to a 
shader instance supply information specific to surface points, such 
as surface normal (N) and surface color (Cs). "It may be useful to 
think of a shader instance as an object bundling the functionality of 
the shading procedure with values for the instance variables used by 
the procedure" [14, Chapter 16]. Shader calls read from and write 
to special global variables. 

There is a third frequency of evaluation as well, namely the
contribution of several light sources per surface point. Here is a
definition of a diffuse lighting function, commonly used in shader
definitions [14, Chapter 16].

color
diffuse (point norm)
{
    color C = 0;
    unitnorm = normalize(norm);
    illuminance (P, unitnorm, PI/2)
        C += Cl * normalize(L).unitnorm;
    return C;
}

The illuminance construct iterates over light sources, combining
the effects of its body statement, using light-source-specific values
for light color (Cl) and direction (L).

5.1 The essence of shading languages 

To create a semantic basis for shaders, consider the information that 
a shader has access to and what it can produce. Some information 
comes from the viewing environment, some comes from a point on 
the surface, and some from a light source relative to that point. 

A viewing environment consists of an ambient light color, a 3D
eye position, and a collection of light sources:

type ViewEnv = (Color, ℝ³, [Light])

Information about a surface at a point includes the point's posi- 
tion, a pair of partial derivatives (each tangent to the surface at that 
point), and an intrinsic color: 

type SurfPt = (ℝ³, (ℝ³, ℝ³), Color)

For our purposes, a light source is something that provides light
information to every point in space (though to some points it provides
blackness), independent of obstructions.³

type Light = ℝ³ → LightInfo

³In a more sophisticated model, a light source would probably
also take into consideration atmosphere and solid obstructions.

Light information delivered to a point consists simply of color and
direction. Any given shader will decide what to do with this
information. Attenuation and relation of light position (if finitely
distant) to surface position are already accounted for.

type LightInfo = (Color, N3)

For example, here are definitions for simple directional and point 
lights (without distance-based attenuation): 

dirLight :: Color → N3 → Light
dirLight col dir = const (col, dir)

pointLight :: Color → ℝ³ → Light
pointLight col lightPos p =
  (col, normalize (lightPos − p))

There are three different kinds of shaders, corresponding to the 
three stages of information used in the shading process. "View 
shaders" depend only on viewing environment; "surface shaders" 
depend additionally on surface point info; and "light shaders" de- 
pend additionally on a single light info. View shaders are not par- 
ticularly useful, but are included for completeness. 

Rather than restricting to a single resulting value type like Color, it
will be useful to generalize to arbitrary result types:⁴

type VShader a = ViewEnv → a

type SShader a = VShader (SurfPt → a)

type LShader a = SShader (LightInfo → a)



5.2 A "shading language" 

Given the model above, one could simply start writing shaders as
functions. Doing so leads to awkward-looking code, however, due
to the explicit passing around and extraction of view, surface point,
and light information. This explicit passing is not necessary in the
RenderMan shading language thanks to the use of global variables.
Fortunately, we can keep our function-based semantic model and
remove the notational clutter. The trick is to build shaders using
higher-order building blocks, and define overloadings.⁵

First define extractors that access information from the view
environment:

ca     :: VShader Color   ; ca (c, _, _) = c
eye    :: VShader ℝ³      ; eye (_, e, _) = e
lights :: VShader [Light] ; lights (_, _, l) = l

Similarly for surface point info: 

pobj :: SShader ℝ³       ; pobj _ (p, _, _) = p
dp   :: SShader (ℝ³, ℝ³) ; dp _ (_, d, _) = d
cs   :: SShader Color    ; cs _ (_, _, c) = c

⁴In the RenderMan shading language, shaders do not have return
values at all, but rather assign to globals, and shaders are not
allowed to call other shaders. There are also "functions", which
return values and can be called by shaders and other functions.

⁵As discussed in Section 5.3, one could instead use implicit
parameters.

Using the full derivative (Jacobian matrix) dp, we can easily define
the two partial derivatives by selection and the surface normal vector
by cross product.

dpdu, dpdv :: SShader ℝ³
dpdu e s = fst (dp e s)
dpdv e s = snd (dp e s)

n :: SShader N3
n = normalize (cross dp)

Light shaders need extractors as well: 

cl :: LShader Color ; cl _ _ (c, _) = c
l  :: LShader N3    ; l  _ _ (_, d) = d

It is easy to precisely define a counterpart to RenderMan's
illuminance construct. To turn a light shader into a surface shader,
simply iterate over the light sources in the viewing environment,
apply to the surface point to get the required light information, and
sum the results.⁶

illuminance :: Num a ⇒ LShader a → SShader a
illuminance lshader v@(_, _, ls) s@(p, _, _) =
  sum [lshader v s (light p) | light ← ls]

Sometimes we need to mix light and surface shaders, which we do
by lifting a surface shader into a light shader. For instance, the
dot product between normal vector and light direction is commonly
used in shaders.

ndotL :: LShader ℝ
ndotL = toLS n <•> l

The dot product here is on functions. 

The toLS function simply adds an ignored argument: 

toLS ss v s _ = ss v s

This function is actually overloaded to work on view shaders and 
non-shaders as well, adding one or two ignored arguments, respec- 
tively. Similarly, there are overloaded toES and toSS functions. 
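
The function-based shader model of this section can be run as ordinary Haskell once the types are made concrete. The following sketch assumes ℝ = Float and represents both Color and directions as Float triples. Because plain tuples are not numbers, the hypothetical helpers addV, scaleV, dot, and cross replace the overloaded operators, and the extractors are inlined by pattern matching.

```haskell
type R     = Float
type R3    = (R, R, R)
type Color = R3                         -- assumption: colors are RGB triples

type LightInfo = (Color, R3)            -- light color and direction
type Light     = R3 -> LightInfo
type ViewEnv   = (Color, R3, [Light])   -- ambient color, eye point, lights
type SurfPt    = (R3, (R3, R3), Color)  -- position, partial derivatives, color

type VShader a = ViewEnv -> a
type SShader a = VShader (SurfPt -> a)
type LShader a = SShader (LightInfo -> a)

addV, cross :: R3 -> R3 -> R3
addV (a, b, c) (x, y, z)  = (a + x, b + y, c + z)
cross (a, b, c) (x, y, z) = (b * z - c * y, c * x - a * z, a * y - b * x)

dot :: R3 -> R3 -> R
dot (a, b, c) (x, y, z) = a * x + b * y + c * z

scaleV :: R -> R3 -> R3
scaleV k (x, y, z) = (k * x, k * y, k * z)

normalize :: R3 -> R3
normalize v = scaleV (1 / sqrt (dot v v)) v

dirLight :: Color -> R3 -> Light
dirLight col dir = const (col, dir)

-- Surface normal: cross product of the two partial derivatives.
n :: SShader R3
n _ (_, (du, dv), _) = normalize (cross du dv)

ndotL :: LShader R
ndotL v s (_, l) = dot (n v s) l

-- Turn a light shader into a surface shader by summing over all lights.
illuminance :: LShader Color -> SShader Color
illuminance lshader v@(_, _, ls) s@(p, _, _) =
  foldr addV (0, 0, 0) [lshader v s (light p) | light <- ls]

diffuse :: SShader Color
diffuse = illuminance (\v s li@(cl, _) -> scaleV (ndotL v s li) cl)
```

As in the text, nothing clamps the dot product here, so a light facing away from the normal contributes a negative amount.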

5.3 Implicit parameters 

We also implemented the shading language using implicit
parameters [7]. The following definitions describe dependencies on view,
surface point, and light information, abstracting out the details:

type ViewDep a =
  (?ca :: Color, ?eye :: ℝ³, ?lights :: [Light]) ⇒ a
type SurfDep a =
  (?cs :: Color, ?pobj :: ℝ³, ?dp :: (ℝ³, ℝ³)) ⇒ a
type LightDep a = (?cl :: Color, ?l :: ℝ³) ⇒ a

type VShader a = ViewDep a

type SShader a = VShader (SurfDep a)

type LShader a = SShader (LightDep a)

⁶A more sophisticated renderer might use a different set of light
sources, synthesized from the environment's lights, to simulate area
light sources and inter-object reflection and occlusion.

This formulation eliminates the need for toLS and the lifting
functions used in the explicit function formulation. It is, however,
rather demanding of the type system. The original implementations of
implicit parameters in GHC did not support type definitions like
ViewDep, SurfDep, and LightDep, requiring instead that all of the
implicit parameters be mentioned explicitly at every use. For
example, instead of the simple types for n and ndotL above, we would
have something like the following.

n :: (?dp :: (ℝ³, ℝ³)) ⇒ N3
n = normalize (cross ?dp)

ndotL :: (?dp :: (ℝ³, ℝ³), ?l :: ℝ³) ⇒ ℝ
ndotL = n <•> ?l

Note how these implementations of n and ndotL show through in 
their types. It gets worse from there: as more and more pieces 
of the view, surface point, and light contexts are used, the explicit 
lists of implicit parameters grow. Fortunately, GHC's type checker 
was improved to handle definitions like ViewDep and the others, 
so we were able to hide all of the implicit parameters. The actual 
definitions look like the following. 

dp :: SShader (ℝ³, ℝ³)
dp = ?dp

n :: SShader ℝ³
n = normalize (cross dp)

ndotL :: LShader ℝ
ndotL = n <•> l

The improvements made to GHC for supporting such convenient
definitions are not present in Hugs, which we also wanted to use,
so for now, Vertigo has both the explicit and implicit parameter
approaches. Since the latter is more convenient, we will use it for
the examples in the next section.



5.4 Sample shading specifications 

Given this simple shading language, we can define some common 
shaders. The simplest (other than pure ambient or pure intrinsic) is 
pure diffuse. It uses n <•> l to scale the light color, and sums over
all light directions l.

diffuse :: SShader Color
diffuse = illuminance (ndotL • cl)

We then make a weighted combination of pure ambient (ca) and 
diffuse: 

ambDiff :: ℝ² → SShader Color
ambDiff (ka, kd) = cs • (ka • ca + kd • diffuse)

To make surfaces look shiny, we turn to specular shading, which is 
independent of intrinsic color. 

specular :: ℝ → SShader Color
specular sh = illuminance ((vdotR ** sh) • cl)

vdotR :: LShader ℝ
vdotR = eyeDir <•> reflect l n

eyeDir :: SShader N3
eyeDir = normalize (eye − pobj)



The pictures in Section 4 are made using a weighted combination 
of ambient, diffuse, and specular shading. 

basic :: ℝ⁴ → SShader Color
basic (ka, kd, ks, sh) =
  ambDiff (ka, kd) + ks • specular sh

Many other shaders may be defined, e.g., brushed metal. 

6 The GPU compiler 

Vertigo is implemented as an optimizing compiler, in the style of
Pan [2]. The main difference is that Vertigo targets a modern
graphics processor architecture, rather than a general purpose CPU
instruction set.

The target GPU architecture and instruction set have some unusual 
traits that make it challenging and interesting to compile into correct 
and efficient code. 

• Most operations work on quad-floats. 

• Operand registers may be negated and/or "swizzled" for free. 
Swizzling is extraction and rearrangement of scalar compo- 
nents to form a new vector, possibly omitting or replicating 
components. The same component may be used more than 
once to form an operand. 

• There are no literals in the assembly code. All literals must be 
loaded into constant registers (also quad-floats). 

• At most one constant register and one vertex register can be
accessed per instruction.

• There is no conditional instruction. 

• There is a multiply-add instruction (a-b + c). 

• There are no trig functions, so they must be approximated. 

6.1 Front end 

The front end of the Vertigo compiler is similar to that of Pan [2], 
with the following main differences: 

• The data types supported are 1- to 4-tuples of 32 bit floats. 

• The primitive operations are altered to target GPUs. 

• Many of the algebraic rewrites use associative-commutative
matching.

The programming interface is a set of statically typed definitions
that make calls to a layer of dynamically typed "smart
constructors", as in Pan [2]. The type ℝ used above refers to statically
typed, float-valued expressions.

The smart constructors perform bottom-up algebraic simplifications
and build expressions, which may be literals, variables, applications
of primitive operators, or let-bindings:

data Exp = LitVec Vector
         | Var Id Type
         | Apply Op [Exp]
         | Let [(Id, Exp)] Exp



The set of primitive operators reflects the GPU instruction set:

data Op = Add | Mul | Mad | Max | Min | Sge | Slt
        | Mov
        | Rcp | Rsq | Log | Exp
        | Dp3 | Dp4
        | Expp | Logp | Frc
        | Negate | Swizzle [Int] | MkVec
        | Frac
        | Cos | Sin

Notes: 

• The first line (add, multiply, multiply-add, max, min, ≥, and
<) contains SIMD operations. The last two return a vector
containing floats that represent booleans, using zero for false
and one for true. All are binary except Mad, which is ternary
(a • b + c).

• Mov is the unary identity operator.

• The third line (1/x, 1/√x, log₂ x, and 2ˣ) contains operations
that work only on scalar values (presumably because SIMD
execution would use too much time or silicon).

• The fourth line contains 3D and 4D dot product operations, 
computing scalar results. 

• Negation and swizzling are pseudo-operations. They are inte- 
grated into each generated instruction but are logically sepa- 
rate at this level. Vector construction is also a pseudo-op. 

• The Sin and Cos operators are introduced but replaced later by
approximations. The main reason is to allow computation of
derivatives before approximation rather than after, resulting in
a more precise approximation of the derivative.



6.2 Smart constructors 

The smart constructors invoked by the statically typed interface dif- 
fer from those in Pan because of the target architecture. 

For instance, as the only comparators are ≥ and <, other boolean
operators must be synthesized. For clarity, we state the translations
in concrete syntax, though the actual implementation does pattern
matching on the Exp.

e1 == e2 = e1 ≥ e2 ∧ e2 ≥ e1
e1 /= e2 = e1 < e2 ∨ e2 < e1

a > b = b < a
a ≤ b = b ≥ a

not (e1 < e2) = e1 ≥ e2
not (e1 ≥ e2) = e1 < e2

Although the statically typed layer has a Bool type, the GPU archi- 
tecture simulates booleans via floating point numbers, using 1 .0 for 
True and 0.0 for False. Thus, 

not c = 1 − c
(∧) = min
(∨) = max

if c then a else b = c • a + not c • b

Note in this last definition that if-then-else is strict.⁷

The Vector and Id types used in the Exp data type of Section 6.1 are
synonyms:

type Vector = [Float]
type Id = String -- variable name
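
The float encoding of booleans described in this section can be tried on plain floats. This is a sketch only: the names sge, slt, notB, andB, orB, eqB, neB, gtB, leB, and ifB are hypothetical and operate on Float values rather than on Vertigo's Exp type.

```haskell
-- Booleans encoded as floats: 1.0 for True and 0.0 for False.
type B = Float

-- The two native comparisons: set-on-greater-or-equal, set-on-less-than.
sge, slt :: Float -> Float -> B
sge a b = if a >= b then 1 else 0
slt a b = if a < b then 1 else 0

notB :: B -> B
notB c = 1 - c

andB, orB :: B -> B -> B
andB = min
orB  = max

-- The remaining comparisons, synthesized from sge and slt.
eqB, neB, gtB, leB :: Float -> Float -> B
eqB a b = sge a b `andB` sge b a
neB a b = slt a b `orB` slt b a
gtB a b = slt b a
leB a b = sge b a

-- Conditional as arithmetic: both branches are always evaluated.
ifB :: B -> Float -> Float -> Float
ifB c a b = c * a + notB c * b
```

Strictness has a visible consequence: ifB 1 1 (1/0) is NaN rather than 1, because the unselected infinite branch is still multiplied by zero.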

6.3 Literal extraction 

Because the target instruction set does not support literals, the
compiler must extract literals and allocate them into the constant
register set. Extraction proceeds in three phases: discover the
literals, pack them efficiently into a constant register file, and
replace the literals with variables (possibly swizzled and negated).

extractLiterals :: Int -> Exp -> (Exp, RegFile)
extractLiterals numRegs exp =
  (replace regs exp, regs)
 where
  regs = pack numRegs (discover exp)

discover :: Exp -> [Vector]

pack :: Int -> [Vector] -> RegFile

replace :: RegFile -> Exp -> Exp

type RegFile = [Vector]
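To make the three phases concrete, here is a minimal sketch on a toy expression type. It is not Vertigo's implementation: the Exp type is simplified, packing is a plain first-fit into four-float registers, negation is ignored, and the naming of constant registers as c0, c1, ... is an assumption made for illustration.

```haskell
import Data.List (nub)

type Vector  = [Float]
type RegFile = [Vector]

-- A toy expression type; Vertigo's Exp is richer.
data Exp = Lit Vector | Var String | Apply String [Exp]
  deriving (Eq, Show)

-- Phase 1: collect the distinct literals.
discover :: Exp -> [Vector]
discover = nub . go
  where
    go (Lit v)      = [v]
    go (Var _)      = []
    go (Apply _ es) = concatMap go es

-- Phase 2: first-fit packing into registers of at most four floats.
pack :: Int -> [Vector] -> RegFile
pack numRegs = foldl place []
  where
    place regs v =
      case break fits regs of
        (before, r : after) -> before ++ (r ++ v) : after
        (_, [])
          | length regs < numRegs -> regs ++ [v]
          | otherwise -> error "pack: out of constant registers"
      where
        fits r = length r + length v <= 4

-- Phase 3: replace each literal by a swizzled constant register.
replace :: RegFile -> Exp -> Exp
replace regs = go
  where
    go (Lit v)      = Var (locate v)
    go (Var x)      = Var x
    go (Apply f es) = Apply f (map go es)

    locate v =
      case hits of
        (name : _) -> name
        []         -> error "replace: literal not in register file"
      where
        n    = length v
        hits = [ "c" ++ show i ++ "." ++ take n (drop o "xyzw")
               | (i, r) <- zip [0 :: Int ..] regs
               , o <- [0 .. length r - n]
               , take n (drop o r) == v ]

extractLiterals :: Int -> Exp -> (Exp, RegFile)
extractLiterals numRegs e = (replace regs e, regs)
  where
    regs = pack numRegs (discover e)
```

A real packer could do better, for example by reusing components through swizzling and negation, which this sketch does not attempt.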

6.4 Codegen normal form 

In preparation for code generation, the Vertigo compiler rewrites 
expressions into "codegen normal form" (CNF) designed to reflect 
what the processor can do. 

CNF is a subset of the Exp type such that: 

• There are no literals. 

• Operators other than MkVec may be applied only to
"operands", which are swizzled and possibly negated variables.

• Swizzling, negation, and variables show up only in these 
operands. (If necessary, a Mov (identity) operator application 
is inserted.) 

Variables will correspond to readable registers, possibly swizzled 
for layout. Swizzling and negation get rewritten away whenever 
possible, by using distributive properties and pushing them into 
operand position where they cost nothing. 

For negation, the following distributive properties are used:⁸

-(-a) = a
-(a + b) = (-a) + (-b)
-(max a b) = min (-a) (-b)
-(min a b) = max (-a) (-b)
-(a · b + c) = a · (-b) + (-c)
-(a · b) = (-a) · b
-(1/a) = 1/(-a)
-(a <·> b) = (-a) <·> b
-(e1, ..., en) = (-e1, ..., -en)
-(e.swiz) = (-e).swiz



The last rule refers to negations of swizzled expressions. Here swiz
refers to a sequence of x, y, z, and w components (with n components
if e :: ℝⁿ).

⁷More modern GPU architectures do support booleans and non-strict
conditionals.

⁸These rewrites do not need to be applied recursively. One
application suffices to move the negation to operand position. Recall
that <·> is dot product.
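The negation rules can be read as a one-step rewrite at the root of an expression. The sketch below shows a few of them on a toy expression type; the type, its constructors, and the function pushNeg are hypothetical and cover only part of the rule set.

```haskell
-- A toy expression type, just large enough to show the rewrite.
data Exp
  = Var String
  | Neg Exp
  | Add Exp Exp
  | Mul Exp Exp
  | Mad Exp Exp Exp   -- Mad a b c stands for a * b + c
  | Max Exp Exp
  | Min Exp Exp
  | Mov Exp           -- identity operator
  deriving (Eq, Show)

-- One step: move a negation at the root into operand position.
pushNeg :: Exp -> Exp
pushNeg (Neg (Neg a))     = a
pushNeg (Neg (Add a b))   = Add (Neg a) (Neg b)
pushNeg (Neg (Max a b))   = Min (Neg a) (Neg b)
pushNeg (Neg (Min a b))   = Max (Neg a) (Neg b)
pushNeg (Neg (Mad a b c)) = Mad a (Neg b) (Neg c)
pushNeg (Neg (Mul a b))   = Mul (Neg a) b
pushNeg (Neg (Var x))     = Mov (Neg (Var x))  -- no operator to absorb it
pushNeg e                 = e
```

The Var case mirrors the fallback of inserting an identity operator when there is no existing operator to push the negation into.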

Similarly, there are helpful properties for rewriting swizzlings. For 
all SIMD operations op, 

(op e1 ... en).swiz = op (e1.swiz) ... (en.swiz)

Swizzlings of explicit vector constructions get swizzled
syntactically, e.g.,

(a, b, c).xzyz = (a, c, b, c)

Composed swizzles are composed syntactically, e.g.,

(e.yzw).yx = e.zy

When a negation or swizzling cannot be pushed into an existing
operator, we simply introduce a new identity operator (Mov) to push
it into, which will cost an additional instruction.

CNF conversion also turns combinations of multiply and add into
single Mad applications.
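The two syntactic swizzle rules amount to list indexing. The following sketch represents a swizzle as a list of component indices, as in the Swizzle type of Section 6.5; the helper names fromNames, toNames, composeSwiz, and swizzleVec are hypothetical.

```haskell
-- A swizzle is a list of component indices (x = 0, y = 1, z = 2, w = 3).
type Swizzle = [Int]

-- Read a swizzle from component letters, e.g. "yzw" becomes [1,2,3].
fromNames :: String -> Swizzle
fromNames = map index
  where
    index c = case lookup c (zip "xyzw" [0 ..]) of
      Just i  -> i
      Nothing -> error ("bad component: " ++ [c])

-- Show a swizzle as component letters.
toNames :: Swizzle -> String
toNames = map ("xyzw" !!)

-- (e.inner).outer = e.(composeSwiz inner outer)
composeSwiz :: Swizzle -> Swizzle -> Swizzle
composeSwiz inner outer = map (inner !!) outer

-- Swizzling an explicit vector construction just picks components.
swizzleVec :: [a] -> Swizzle -> [a]
swizzleVec es sw = map (es !!) sw
```

With these definitions, composing "yzw" with "yx" gives "zy", and swizzling the construction (a, b, c) by "xzyz" gives (a, c, b, c), matching the two examples in the text.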

6.5 Assembly language modeling 

An assembly program is simply a list of instructions. All instruc- 
tions are operator applications (even Mov) and contain a comment, 
in which the compiler inserts a binding in CNF. 

type Asm = [Instr]

data Instr = PrimOp Op Dest [Source] String

A register has a register class, an index, and a friendly name.

data RegClass = RegIn | RegConst | RegTemp
              | RegAddr | RegOut

data Reg = Reg RegClass Int String

Source registers may be swizzled and negated. The register may
not be an output.

data Source = Source NegSwiz Reg

data NegSwiz = NegSwiz Bool Swizzle
type Index = Int
type Swizzle = [Index]

Each destination has a register and a layout saying which floats
within the register get used. The register may not be an input.

data Dest = Dest Reg Layout
type Layout = [Index] -- distinct
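As an illustration of how this model maps to textual assembly, the sketch below prints an instruction. Several details are assumptions rather than part of Vertigo: the operator is reduced to a mnemonic string, the register-class prefixes (v, c, r, a, o) are chosen for readability, and the functions showReg, comps, showSource, and showInstr are hypothetical.

```haskell
import Data.List (intercalate)

data RegClass = RegIn | RegConst | RegTemp | RegAddr | RegOut
data Reg      = Reg RegClass Int String
type Index    = Int
type Swizzle  = [Index]
data NegSwiz  = NegSwiz Bool Swizzle
data Source   = Source NegSwiz Reg
type Layout   = [Index]
data Dest     = Dest Reg Layout

-- Simplification: the operator is just its mnemonic.
data Instr = PrimOp String Dest [Source] String

-- Register name from class and index (prefixes are illustrative).
showReg :: Reg -> String
showReg (Reg cls i _) = prefix cls ++ show i
  where
    prefix RegIn    = "v"
    prefix RegConst = "c"
    prefix RegTemp  = "r"
    prefix RegAddr  = "a"
    prefix RegOut   = "o"

-- Component suffix, used for both write masks and swizzles.
comps :: [Index] -> String
comps is = '.' : map ("xyzw" !!) is

showSource :: Source -> String
showSource (Source (NegSwiz neg sw) r) =
  (if neg then "-" else "") ++ showReg r ++ comps sw

-- The trailing comment field is ignored here.
showInstr :: Instr -> String
showInstr (PrimOp op (Dest r layout) srcs _) =
  op ++ " " ++ intercalate ", " ((showReg r ++ comps layout) : map showSource srcs)
```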

6.6 Code generation 

Given an expression in CNF, code generation is fairly straightfor- 
ward. Because GPUs have no random memory access, optimized 
register allocation is particularly important. The Vertigo compiler 
uses a simple functional implementation of the traditional dynamic 
programming technique [1]. 

A "code generator" tells how much free register space is needed (in 
floats) and how to generate code. The free space requirement will
be used for argument reordering.

type CodeGen = (Int, Gen)

A Gen generates code for a given destination, an extra swizzle
required to accommodate the destination layout, a mapping from
variables to sources, and a pool of free temporary registers.

type Gen = Dest -> Swizzle -> SourceEnv -> Pool -> Asm

type SourceEnv = [(Id, Source)] -- assoc list

Code generation then maps an expression in CNF into a CodeGen:

codegen :: CNF -> CodeGen

Thanks to CNF, there are only two cases: (a) applications of oper- 
ators to optionally negated and swizzled variables, and (b) let ex- 
pressions. 

The application case is simple: for each argument, get the source 
bound to the variable in the environment, and compose the contex- 
tual negation and swizzle with the source's to form the instruction 
operand. Then use the destination layout as a mask for the result 
register. 
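Composing a contextual negation and swizzle with those already attached to a source is a small calculation. A minimal sketch, assuming swizzles are index lists as in Section 6.5; the function composeNS is hypothetical.

```haskell
type Swizzle = [Int]

data NegSwiz = NegSwiz Bool Swizzle
  deriving (Eq, Show)

-- Apply an outer (contextual) negation and swizzle on top of the
-- negation and swizzle already carried by a source.
composeNS :: NegSwiz -> NegSwiz -> NegSwiz
composeNS (NegSwiz nOuter sOuter) (NegSwiz nInner sInner) =
  NegSwiz (nOuter /= nInner)        -- two negations cancel
          (map (sInner !!) sOuter)  -- outer indices select from inner
```

For example, an outer -(_.yx) applied to a source -(r.yzw) yields the operand r.zy with no negation.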

There is one tricky point arising from layout. Variables smaller
than ℝ⁴ may require swizzling on write, which is not supported
in general by the processor architecture. However, for almost all
operations, a write swizzle can be correctly simulated by a
combination of write masking and argument swizzling. For SIMD ops,
it suffices to swizzle each argument correspondingly. For
scalar-producing ops, the same scalar result is written to all
components of the output, so write swizzling is just write masking.
The remaining instructions write to all four components, and so do
not pose a problem, unless a non-obvious layout were used. To handle
this concern, all four-float allocations are given the identity
layout, so that unpredictable layout swizzling cannot happen. If we
were not so lucky with the instruction set, we could insert a Mov
instruction that swizzled its argument as necessary.

All register allocation is handled by the let case. For an n-ary let, 
there are n + 1 stages of evaluation: one for each right hand side
and one for the body. The register use will be the maximum of the
register uses over the n + 1 stages. At each stage, we have to preserve
the registers used to hold the results of previous stages. Since later 
stages have the added burden of preserving earlier results, we rear- 
range the bindings to put the less register-intensive bindings later, 
thus minimizing the maximum register usage over the stages. We 
cannot move the body, since it depends on all the bindings. 

codegen (Let bindings body) = (nr, gen)
 where
  (vars, types, cgs) =
    unzip3 (reorder (zip3 vars0
                          (map typeOf exps0)
                          (map codegen exps0)))
  (vars0, exps0) = unzip bindings

Reordering just sorts by decreasing register use:

reorder = sortF (\(_, _, (nr, _)) -> -nr)

where sortF sorts based on a given key extractor function:

sortF :: Ord k => (a -> k) -> [a] -> [a]
sortF key =
  sortBy (\a b -> key a `compare` key b)

To determine the number of free registers needed for the let expres- 
sion, we need to know (a) the space tied up at each stage (sum of 
the sizes of values saved so far), and (b) the amount of free space 
needed for each binding. 

savedRs = scanl1 (+) (0 : map typeSize types)
nr = maximum (zipWith (+) (nrs ++ [nrb]) savedRs)
(nrs, gens) = unzip cgs
(nrb, genb) = codegen body

The code generated for the let expression comes from code gener- 
ated for the bindings followed by code for the body. 

gen dest swiz env pool =
  asm ++ genb dest swiz env' pool'
 where
  (asm, env', pool') =
    genBindings vars types gens env pool

Code generation for bindings (genBindings) works simply by loop- 
ing through the (now reordered) bindings, allocating space from the 
temporary registers, and recursively generating code for the right 
hand sides. 
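The effect of reordering on register need can be checked with a small numeric sketch. Each binding is reduced to a pair of numbers, the free floats needed while evaluating it and the size of its result; the type Binding and the function letNeed are hypothetical simplifications of the definitions above.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- (free floats needed while evaluating, size in floats of the result)
type Binding = (Int, Int)

-- Most register-hungry bindings first.
reorder :: [Binding] -> [Binding]
reorder = sortBy (comparing (negate . fst))

-- Free floats needed to evaluate the bindings in the given order,
-- followed by a body that itself needs nrb free floats.
letNeed :: [Binding] -> Int -> Int
letNeed bs nrb = maximum (zipWith (+) (nrs ++ [nrb]) savedRs)
  where
    (nrs, sizes) = unzip bs
    savedRs      = scanl1 (+) (0 : sizes)
```

For the bindings [(2,4), (8,1), (5,3)] and a body needing 3 floats, the given order needs 12 free floats, while the reordered list needs 11. The body's stage is unaffected, since by then all results are saved regardless of order.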

7 Sample optimizations 

In this section, we show examples to give a flavor of the kinds of 
optimizations that Vertigo performs in practice. 

7.1 Vector normalization 

It is common to need to normalize vectors (i.e., scale them to unit
length). One use is the construction of normals for shading
(Section 5.2) and for displacement surfaces (Section 4.4). A painful
tradeoff in graphics programming is whether utility functions like
normal computation should normalize their vector arguments or
assume them to have been normalized. Since execution speed is so
important, the choice is often made to assume pre-normalization,
so that the normalization can be avoided in a few cases.
Unfortunately, this choice encourages one of the classic computer
graphics programming bugs, which is failure to normalize before
calling, either due to forgetting the requirement or falsely
assuming it to hold.

With a sufficiently aggressive optimizer, one might hope to
eliminate the pre-normalization requirement and still get efficient
code when the actual argument has been normalized. That is, the
compiler should perform the following optimization
(interprocedurally).⁹

normalize (normalize v) = normalize v

⁹There may be other, subtler, sources of redundant normalization.

Rather than wire this domain-specific optimization into an otherwise
domain-independent compiler, Vertigo performs simpler and more
general rewrites. Given the definition of normalize (from
Section 4.4), normalize (normalize v) expands to

normalize v / sqrt (normalize v <·> normalize v)

The sub-expression normalize v <·> normalize v expands to

(v / sqrt (v <·> v)) <·> (v / sqrt (v <·> v))

The following rewrites apply, with r and s ranging over scalars and
u and v over vectors:¹⁰

v / s = (1/s) · v
(s · u) <·> v = s · (u <·> v)
(1/r) · (1/s) = 1/(r · s)
sqrt r · sqrt s = sqrt (r · s)
sqrt (s · s) = s

The result is

(v <·> v) / (v <·> v)

which simplifies to 1. The overall expression then becomes

normalize v / sqrt 1

which simplifies to normalize v.
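The identity can be sanity-checked numerically. The sketch below is not the compiler's symbolic rewrite; it evaluates both sides on concrete vectors, assuming normalize v = v / sqrt (v <·> v) as in the expansion above. The names Vec, dot, and maxDiff are hypothetical.

```haskell
type Vec = [Double]

dot :: Vec -> Vec -> Double
dot u v = sum (zipWith (*) u v)

-- Scale a (nonzero) vector to unit length.
normalize :: Vec -> Vec
normalize v = map (/ sqrt (dot v v)) v

-- Largest componentwise difference between two vectors.
maxDiff :: Vec -> Vec -> Double
maxDiff u v = maximum (zipWith (\a b -> abs (a - b)) u v)
```

For a vector such as [3, 4, 12], maxDiff (normalize (normalize v)) (normalize v) is zero up to floating point rounding, whereas the symbolic rewrite removes the second normalization entirely.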

As a particularly fortuitous example of this optimization and
others, consider normal sphere. Without optimization there are 28
additions, 50 multiplications, and four trigonometry operations. With
optimization there are two additions, four multiplications, and four
trigonometry operations. In fact, the result is identical to sphere
itself, so the savings are compounded when rendering a sphere,
which requires the surface and its normal.



7.2 Cross products 

The previous example is architecture-independent. The definition 
of 3D cross products, used also in normal computation, gives rise 
to an example of architecture-specific optimization. 

(×) :: (Num s, VectorOf s (s, s, s))
    => (s, s, s) -> (s, s, s) -> (s, s, s)
(a1, b1, c1) × (a2, b2, c2) =
  (b1 · c2 - b2 · c1, c1 · a2 - c2 · a1, a1 · b2 - a2 · b1)

Automatic vectorization performs the following transformation for
all SIMD operations op:

(op a1 b1 ..., op a2 b2 ..., ..., op an bn ...) =
  op (a1, a2, ..., an) (b1, b2, ..., bn) ...

This rule applies four times in the definition of × (since subtraction
is represented by addition and unary negation), yielding

(a1, b1, c1) × (a2, b2, c2) =
  (b1, c1, a1) · (c2, a2, b2) - (b2, c2, a2) · (c1, a1, b1)

Note that each constructed vector is a rearrangement of one of the
vector arguments to ×. That fact means that the vectors are just
swizzlings:

u × v = u.yzx · v.zxy - v.yzx · u.zxy



In CNF, the body of this definition is

let q = v.yzx · u.zxy in
mad (u.yzx) (v.zxy) (-q)

Assuming u and v are allocated in the first three floats of registers
r0 and r1 respectively, and the result should go into the first three
floats of r2, Vertigo produces the following two instructions:

mul r2.xyz, r1.yzx, r0.zxy
mad r2.xyz, r0.yzx, r1.zxy, -r2.xyz
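The equivalence between the componentwise definition and the swizzled form is easy to check on concrete triples. This is a minimal sketch over Double rather than over Vertigo expressions; the names V3, cross, crossSwiz, yzx, zxy, mul3, and sub3 are hypothetical.

```haskell
type V3 = (Double, Double, Double)

-- Componentwise definition.
cross :: V3 -> V3 -> V3
cross (a1, b1, c1) (a2, b2, c2) =
  (b1 * c2 - b2 * c1, c1 * a2 - c2 * a1, a1 * b2 - a2 * b1)

-- The two swizzles used by the vectorized form.
yzx, zxy :: V3 -> V3
yzx (x, y, z) = (y, z, x)
zxy (x, y, z) = (z, x, y)

-- SIMD-style multiply and subtract on triples.
mul3, sub3 :: V3 -> V3 -> V3
mul3 (a, b, c) (d, e, f) = (a * d, b * e, c * f)
sub3 (a, b, c) (d, e, f) = (a - d, b - e, c - f)

-- u × v = u.yzx · v.zxy - v.yzx · u.zxy
crossSwiz :: V3 -> V3 -> V3
crossSwiz u v = (yzx u `mul3` zxy v) `sub3` (yzx v `mul3` zxy u)
```

Both functions agree on every input, since the swizzled form only regroups the same six products into two componentwise multiplications.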

8 Derivatives 

The derivative operator maps functions to functions. In general,
the derivative of a function of type α → β is a function of type
α → L(α; β), where "L(α; β)" means the linear subset of α → β [13].
These linear maps are typically represented by real numbers, vectors,
matrices, etc., depending on α and β. Because Vertigo uses these data
representations rather than functions for derivative values,
derivative belongs to a multi-parameter type class:

class Derivative α β lmap | α β -> lmap where
  derivative :: (α -> β) -> (α -> lmap)

When α = ℝ, L(α; β) can be represented by β for vector space
types β.

instance (DDeriv b, Substable b) =>
         Derivative ℝ b b where

If, for instance, β = ℝ³, then the derivative values are represented
as vectors of three scalar-valued "partial derivatives".

When α = α1 × ··· × αn, L(α; β) can be represented by γ1 × ··· × γn,
where γi represents L(αi; β).

instance ( Derivative α1 β γ1
         , Derivative α2 β γ2 ) =>
         Derivative (α1, α2) β (γ1, γ2) where

Similarly for triples, etc. When β happens to be a tuple type, the
resulting derivative value representation is a tuple of tuples and
coincides with what is known as a "Jacobian matrix".

The Substable type class contains types that support "substitution"
of an expression for a variable. In the Vertigo implementation, the
Substable instances are ℝ and tuples of Substable types.
Differentiation of functions works by applying the function to one
or more variable expressions, symbolically differentiating the
resulting expression(s), and turning the result back into a function
that substitutes for the introduced variables.

The DDeriv class supports differentiation with respect to variables.
It includes ℝ (expressions over Float), and tuples of DDeriv types.
The ℝ case simply removes the statically typed wrapper, revealing
an underlying Exp (Section 6.1), where the actual, recursive
symbolic differentiation algorithm takes place. It is critical for
efficiency to memoize that algorithm, in order to avoid the usual
problem of time and space blow-up for symbolic differentiation. This
differentiation algorithm is very simple (Figure 8).
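For readers who want to experiment, here is a self-contained sketch of symbolic differentiation on a toy expression type. It is not Vertigo's ederiv: it omits memoization, vectors, and most operators, and the type Exp, the smart constructors add and mul, and the function deriv are hypothetical.

```haskell
data Exp
  = Lit Double
  | Var String
  | Add Exp Exp
  | Mul Exp Exp
  | Sin Exp
  | Cos Exp
  | Neg Exp
  deriving (Eq, Show)

-- Smart constructors that do a little simplification.
add, mul :: Exp -> Exp -> Exp
add (Lit 0) e = e
add e (Lit 0) = e
add a b       = Add a b

mul (Lit 0) _ = Lit 0
mul _ (Lit 0) = Lit 0
mul (Lit 1) e = e
mul e (Lit 1) = e
mul a b       = Mul a b

-- Derivative of an expression with respect to the named variable.
deriv :: String -> Exp -> Exp
deriv v = d
  where
    d (Lit _)   = Lit 0
    d (Var v')
      | v == v'   = Lit 1
      | otherwise = Lit 0
    d (Add a b) = d a `add` d b
    d (Mul a b) = (a `mul` d b) `add` (b `mul` d a)
    d (Sin a)   = Cos a `mul` d a
    d (Cos a)   = Neg (Sin a) `mul` d a
    d (Neg a)   = Neg (d a)
```

Even this small version shows why simplifying constructors matter: differentiating x · sin x yields x · cos x + sin x directly, with the factors of 1 and terms of 0 removed as they are built.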

ederiv :: Id -> Exp -> Exp
ederiv v exp = d exp
 where
  d = memo nd
  -- nd is the non-memoized d
  nd e@(LitVec _)            = zero (typeOf e)
  nd (Var v' ty) | v == v'   = one ty
  nd (Var _ ty)              = zero ty
  nd (Apply Add [u, v])      = d u + d v
  nd (Apply Mul [u, v])      = u · d v + v · d u
  nd (Apply Rcp [v])         = - d v / (v^2)
  nd (Apply Sin [u])         = cos u · d u
  nd (Apply Cos [u])         = - sin u · d u
  nd (Apply Rsq [u])         = - d u · rsqrt u / (twoF · u)
  nd e@(Apply Exp [u])       = e · d u · logTwoF
  nd (Apply Log [u])         = recip u · logTwoF · d u
  nd (Apply Negate [u])      = - (d u)
  nd (Apply MkVec es)        = vecL (map d es)
  nd (Apply (Swizzle s) [u]) = swizzle s (d u)
  nd (Apply Frac [u])        = d u
  nd (Apply Dp3 [u, v])      = dp3 u (d v) + dp3 v (d u)
  nd (Apply Dp4 [u, v])      = dp4 u (d v) + dp4 v (d u)
  nd (Apply Slt [u, v])      = zero (typeOf u)
  nd (Apply Sge [u, v])      = zero (typeOf u)
  nd (Apply Max [u, v])      = ifE (u ≥ v) (d u) (d v)
  nd (Apply Min [u, v])      = ifE (u < v) (d u) (d v)

Figure 6. Symbolic differentiation

¹⁰The Vertigo compiler matches for these rules modulo associativity
and commutativity of multiplication and dot product.

9 Further work

While the Vertigo compiler does a good job of algebraic
simplification, reducing instructions generated and registers used,
there is much room for improvement.

One improvement would be connecting algebraic simplification 
with register allocation. For instance, the automatic vectorization 
transformation mentioned in Section 7.2 replaces one vector con- 
struction with n of them, and is only beneficial when the vector 
constructions become register swizzles, which are free. More
generally, it is important to coalesce scalar operations into vector
operations where possible, but not at the cost of moving scalars into
vectors at run-time. More sophisticated analysis could allocate
scalars in the same vector at compile time, when doing so would allow
replacing several scalar operations with vector operations. Since the
same scalar may be used more than once, there may be competition
among different potentially vectorizable uses of a given scalar.

A related issue is the tension between optimization and sharing. 
Consider the definition of if-then-else in Section 6.2. Optimizing
not c could easily break the sharing of part of the computation of c, 
which may more than defeat the optimization. 

Newer generations of graphics vertex processors have more powerful
instruction sets, including looping, predicated instructions,
conditional branching, and boolean and integer registers. They also
have larger register sets and program length bounds. These
advancements introduce opportunities and challenges for compilation. New
pixel processors are also much more general and now worth target- 
ing. Another general challenge is partitioning computation between 
vertex and pixel processors. 

10 Conclusions 

Programmable multiprocessor architectures have finally reached
the masses in the form of modern graphics cards. This is a great
opportunity for functional programming, because statelessness
naturally fits the hardware, and because the objects of interest in
computer graphics tend to be functions. This paper describes Vertigo,
a functional language for 3D shape and shading and an optimizing
compiler that targets graphics processors. The language has simple,
transparent semantics in terms of first-class functions. Higher-order
programming provides powerful abstractions that allow surfaces to
be composed from simpler components, often of lower dimension.

Shading languages since RenderMan's have had a rather peculiar
execution model, explained as "instancing", "calling", and iteration 
over light sources. As we have shown, execution can be explained 
simply as curried functions having natural staging: shader-specific 
parameters (instancing), view and surface point information (call- 
ing), and per-light information. 

The Vertigo system runs on Windows, with DirectX 9
and the .NET framework. It may be downloaded from
http://conal.net/Vertigo.
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